Overview

Dataset info

Number of variables: 8
Number of observations: 10000000
Missing cells: 9981283 (12.5%)
Duplicate rows: 221295 (2.2%)
Total size in memory: 610.4 MiB
Average record size in memory: 64.0 B
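
The percentages and the average record size above can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset info" table.
n_variables = 8
n_observations = 10_000_000
missing_cells = 9_981_283
duplicate_rows = 221_295
total_size_mib = 610.4

# Missing cells are counted against every cell (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are counted against the number of rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```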

Variable types

Numeric: 4
Categorical: 2
Boolean: 1
Date: 0
URL: 0
Text (Unique): 0
Rejected: 1
Unsupported0

Warnings

Dataset has 221295 (2.2%) duplicate rows [Warning]
attributed_time has a high cardinality: 15699 distinct values [Warning]
attributed_time has 9981283 (99.8%) missing values [Missing]
click_time only contains datetime values, but is categorical. Consider applying pd.to_datetime() [Type]
click_time has a high cardinality: 29943 distinct values [Warning]
os is highly correlated with device (ρ = 0.9682952226) [Rejected]
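
Two of these warnings suggest direct follow-up steps: parsing the timestamp columns with pd.to_datetime() and inspecting the duplicate rows. The sketch below shows one way this might look. The three-row DataFrame is a hypothetical stand-in for the real 10,000,000-row dataset, and whether duplicates should actually be dropped depends on the analysis.

```python
import pandas as pd

# Hypothetical miniature sample; only the column names come from the report.
df = pd.DataFrame({
    "click_time": ["2017-11-06 14:32:21", "2017-11-06 14:33:34", "2017-11-06 14:33:34"],
    "attributed_time": [None, "2017-11-06 16:14:02", "2017-11-06 16:14:02"],
    "is_attributed": [0, 1, 1],
})

# Parse the timestamp strings so the columns are no longer treated as categorical.
df["click_time"] = pd.to_datetime(df["click_time"])
df["attributed_time"] = pd.to_datetime(df["attributed_time"])  # missing values become NaT

# Count fully duplicated rows, then build a de-duplicated copy.
n_duplicates = int(df.duplicated().sum())
deduplicated = df.drop_duplicates()

print(df.dtypes)
print("duplicate rows:", n_duplicates)
```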

Variables

app
Numeric

Distinct count: 332
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 12.8596382
Minimum: 0
Maximum: 675
Zeros (%): < 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 3
Median: 12
Q3: 15
95-th percentile: 26
Maximum: 675
Range: 675
Interquartile range: 12

Descriptive statistics

Standard deviation: 16.52679822
Coef of variation: 1.285168211
Kurtosis: 256.4815165
Mean: 12.8596382
MAD: 7.480779562
Skewness: 11.80146715
Sum: 128596382
Variance: 273.1350595
Memory size: 76.3 MiB
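
Several of the statistics for app are simple functions of the others, which gives a quick consistency check. The sketch below recomputes them from the values listed above; it assumes the usual definitions (range = max - min, IQR = Q3 - Q1, coefficient of variation = standard deviation / mean, variance = standard deviation squared) and that the column is stored as 8-byte values.

```python
# Values copied from the app statistics.
n_observations = 10_000_000
minimum, maximum = 0, 675
q1, q3 = 3, 15
mean = 12.8596382
std = 16.52679822

value_range = maximum - minimum          # Range
iqr = q3 - q1                            # Interquartile range
coef_of_variation = std / mean           # Coef of variation
variance = std ** 2                      # Variance
total = mean * n_observations            # Sum

# Assumption: 8 bytes per value (for example int64), converted to MiB.
memory_mib = n_observations * 8 / 1024**2

print(value_range, iqr)
print(f"{coef_of_variation:.6f} {variance:.4f} {total:.0f} {memory_mib:.1f} MiB")
```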
[Figure: histogram with fixed size bins (bins=50)]
[Figure: histogram with variable size bins (bins=[0.000e+00 5.000e-01 1.500e+00 2.500e+00 3.500e+00 ... 5.555e+02 5.565e+02 5.620e+02 5.635e+02 6.750e+02], "bayesian blocks" binning strategy used)]

Value               Count    Frequency (%)
12                  1291185  12.9%
2                   1202534  12.0%
15                  1181585  11.8%
3                   1170412  11.7%
9                   966839   9.7%
18                  917820   9.2%
14                  507491   5.1%
1                   391508   3.9%
8                   364361   3.6%
21                  223823   2.2%
Other values (322)  1782442  17.8%

Minimum 5 values

Value  Count    Frequency (%)
0      95       < 0.1%
1      391508   3.9%
2      1202534  12.0%
3      1170412  11.7%
4      1567     < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
675    3      < 0.1%
651    3      < 0.1%
645    1      < 0.1%
629    1      < 0.1%
625    1      < 0.1%
 

attributed_time
Categorical

Distinct count: 15699
Unique (%): 0.2%
Missing (%): 99.8%
Missing (n): 9981283

Value                 Count    Frequency (%)
2017-11-06 23:36:23   6        < 0.1%
2017-11-06 23:53:12   5        < 0.1%
2017-11-06 23:28:43   5        < 0.1%
2017-11-07 00:12:00   5        < 0.1%
2017-11-06 23:55:44   5        < 0.1%
2017-11-07 00:03:48   5        < 0.1%
2017-11-07 00:02:07   5        < 0.1%
2017-11-06 16:14:02   4        < 0.1%
2017-11-07 00:04:11   4        < 0.1%
2017-11-06 23:37:42   4        < 0.1%
Other values (15688)  18669    0.2%
(Missing)             9981283  99.8%

Max length: 19
Mean length: 3.0299472
Min length: 3
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True
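
A minimum length of 3 and a mean length of about 3.03 look odd for a column of 19-character timestamps. One plausible reading, which the report itself does not state, is that each missing entry was measured as the three-character string "nan". The sketch below checks that assumption against the reported mean length.

```python
# Counts copied from the attributed_time statistics.
n_total = 10_000_000
n_missing = 9_981_283
n_present = n_total - n_missing

# Assumption: missing entries are measured as the string "nan".
len_missing = len("nan")
len_timestamp = len("2017-11-06 23:36:23")

mean_length = (n_missing * len_missing + n_present * len_timestamp) / n_total
print(mean_length)
```

Under that assumption the result agrees with the reported mean length, and it would also explain why "Contains chars" is True for this column.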

channel
Numeric

Distinct count: 170
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 252.6604048
Minimum: 0
Maximum: 498
Zeros (%): < 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 107
Q1: 134
Median: 237
Q3: 377
95-th percentile: 477
Maximum: 498
Range: 498
Interquartile range: 243

Descriptive statistics

Standard deviation: 130.0375702
Coef of variation: 0.5146733233
Kurtosis: -1.037421756
Mean: 252.6604048
MAD: 109.1975162
Skewness: 0.4725883333
Sum: 2526604048
Variance: 16909.76966
Memory size: 76.3 MiB
[Figure: histogram with fixed size bins (bins=50)]
[Figure: histogram with variable size bins (bins=[ 0. 1.5 3.5 9. 14. ... 488.5 492.5 496.5 497.5 498. ], "bayesian blocks" binning strategy used)]

Value               Count    Frequency (%)
245                 793105   7.9%
134                 630888   6.3%
259                 469845   4.7%
477                 412559   4.1%
121                 402226   4.0%
107                 388035   3.9%
145                 348862   3.5%
153                 296832   3.0%
205                 279720   2.8%
178                 269720   2.7%
Other values (160)  5708208  57.1%

Minimum 5 values

Value  Count  Frequency (%)
0      45     < 0.1%
3      77703  0.8%
4      7      < 0.1%
5      6      < 0.1%
13     1357   < 0.1%

Maximum 5 values

Value  Count   Frequency (%)
498    15      < 0.1%
497    24556   0.2%
496    553     < 0.1%
489    119261  1.2%
488    1208    < 0.1%
 

click_time
Categorical

Distinct count: 29943
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0

Value                 Count    Frequency (%)
2017-11-06 16:05:10   1261     < 0.1%
2017-11-06 16:05:12   1220     < 0.1%
2017-11-06 16:05:11   1206     < 0.1%
2017-11-06 16:05:09   1198     < 0.1%
2017-11-06 16:05:15   1197     < 0.1%
2017-11-06 16:05:14   1194     < 0.1%
2017-11-06 16:00:45   1187     < 0.1%
2017-11-06 16:05:24   1176     < 0.1%
2017-11-06 16:01:06   1174     < 0.1%
2017-11-06 16:00:43   1173     < 0.1%
Other values (29933)  9988014  99.9%

Max length: 19
Mean length: 19
Min length: 19
Contains chars: False
Contains digits: True
Contains spaces: True
Contains non-words: True

device
Numeric

Distinct count: 940
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 33.0387117
Minimum: 0
Maximum: 3545
Zeros (%): 0.5%

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 3545
Range: 3545
Interquartile range: 0

Descriptive statistics

Standard deviation: 308.8297662
Coef of variation: 9.347512366
Kurtosis: 90.16965418
Mean: 33.0387117
MAD: 63.26048567
Skewness: 9.596435052
Sum: 330387117
Variance: 95375.82448
Memory size: 76.3 MiB
[Figure: histogram with fixed size bins (bins=50)]
[Figure: histogram with variable size bins (bins=[0.0000e+00 5.0000e-01 1.5000e+00 3.0000e+00 5.0000e+00 ... 3.0325e+03 3.0345e+03 3.1590e+03 3.1660e+03 3.5450e+03], "bayesian blocks" binning strategy used)]

Value               Count    Frequency (%)
1                   9381146  93.8%
2                   456617   4.6%
3032                104393   1.0%
0                   46476    0.5%
59                  1618     < 0.1%
40                  462      < 0.1%
6                   458      < 0.1%
16                  334      < 0.1%
18                  247      < 0.1%
33                  204      < 0.1%
Other values (930)  8045     0.1%

Minimum 5 values

Value  Count    Frequency (%)
0      46476    0.5%
1      9381146  93.8%
2      456617   4.6%
4      60       < 0.1%
6      458      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
3545   1      < 0.1%
3537   1      < 0.1%
3527   1      < 0.1%
3525   1      < 0.1%
3524   1      < 0.1%
 

ip
Numeric

Distinct count: 68740
Unique (%): 0.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 87331.72281
Minimum: 9
Maximum: 212774
Zeros (%): 0.0%

Quantile statistics

Minimum: 9
5-th percentile: 6976
Q1: 42164
Median: 81973
Q3: 121187
95-th percentile: 193521
Maximum: 212774
Range: 212765
Interquartile range: 79023

Descriptive statistics

Standard deviation: 55675.27388
Coef of variation: 0.6375148925
Kurtosis: -0.681793417
Mean: 87331.72281
MAD: 46048.22713
Skewness: 0.4248442712
Sum: 8.733172281e+11
Variance: 3099736122
Memory size: 76.3 MiB
[Figure: histogram with fixed size bins (bins=50)]
[Figure: histogram with variable size bins (bins=[9.000000e+00 9.500000e+00 1.450000e+01 1.950000e+01 2.250000e+01 ... 2.127545e+05 2.127585e+05 2.127690e+05 2.127720e+05 2.127740e+05], "bayesian blocks" binning strategy used)]

Value                 Count    Frequency (%)
73516                 51711    0.5%
73487                 51215    0.5%
5314                  35073    0.4%
5348                  35004    0.4%
53454                 25381    0.3%
105560                23289    0.2%
100275                23070    0.2%
114276                22774    0.2%
201182                22719    0.2%
105475                22047    0.2%
Other values (68730)  9687717  96.9%

Minimum 5 values

Value  Count  Frequency (%)
9      167    < 0.1%
10     89     < 0.1%
19     30     < 0.1%
20     230    < 0.1%
25     64     < 0.1%

Maximum 5 values

Value   Count  Frequency (%)
212774  39     < 0.1%
212773  182    < 0.1%
212771  14     < 0.1%
212767  1      < 0.1%
212761  6      < 0.1%
 

is_attributed
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count    Frequency (%)
0      9981283  99.8%
1      18717    0.2%
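
The counts above describe a heavily imbalanced target. They also line up with the missing-value count for attributed_time, which is consistent with (though not proof of) attributed_time being filled in only when is_attributed is 1. A small sketch of that arithmetic, with illustrative variable names:

```python
# Counts copied from the is_attributed and attributed_time sections.
n_total = 10_000_000
n_negative = 9_981_283                 # is_attributed == 0
n_positive = 18_717                    # is_attributed == 1
n_missing_attributed_time = 9_981_283  # missing attributed_time values

positive_pct = 100 * n_positive / n_total
negatives_per_positive = n_negative / n_positive

print(f"positive share: {positive_pct:.2f}%")
print(f"negatives per positive: {negatives_per_positive:.0f}")
print("missing attributed_time equals negatives:",
      n_missing_attributed_time == n_negative)
```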
 

os
Highly correlated

This variable is highly correlated with device and should be ignored for analysis.

Correlation: 0.9682952226
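
A check like this can be reproduced by hand before deciding whether to drop os. The sketch below assumes the reported ρ is a Pearson correlation and uses a small made-up sample, so the coefficient it prints will not match the value above.

```python
import pandas as pd

# Hypothetical miniature sample; only the column names come from the report.
df = pd.DataFrame({
    "device": [1, 1, 2, 1, 3032, 0],
    "os": [13, 19, 13, 22, 607, 24],
})

# Pearson correlation between the two columns (assumed to be what the report shows as ρ).
rho = df["os"].corr(df["device"], method="pearson")

# Drop the rejected column, as the report recommends.
reduced = df.drop(columns=["os"])

print(f"rho = {rho:.4f}")
print(list(reduced.columns))
```

Because both columns are ID-like codes, a high linear correlation may be driven by a few extreme values such as device 3032, so it is worth inspecting the joint distribution before discarding os.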

Correlations

Missing values

Sample

First rows

   app  attributed_time  channel  click_time           device  ip      is_attributed  os
0  3    NaN              379      2017-11-06 14:32:21  1       83230   0              13
1  3    NaN              379      2017-11-06 14:33:34  1       17357   0              19
2  3    NaN              379      2017-11-06 14:34:12  1       35810   0              13
3  14   NaN              478      2017-11-06 14:34:52  1       45745   0              13
4  3    NaN              379      2017-11-06 14:35:08  1       161007  0              13
5  3    NaN              379      2017-11-06 14:36:26  1       18787   0              16
6  3    NaN              379      2017-11-06 14:37:44  1       103022  0              23
7  3    NaN              379      2017-11-06 14:37:59  1       114221  0              19
8  3    NaN              379      2017-11-06 14:38:10  1       165970  0              13
9  64   NaN              459      2017-11-06 14:38:23  1       74544   0              22

Last rows

         app  attributed_time  channel  click_time           device  ip      is_attributed  os
9999990  15   NaN              315      2017-11-07 00:12:03  1       51438   0              18
9999991  18   NaN              107      2017-11-07 00:12:03  1       55868   0              8
9999992  25   NaN              259      2017-11-07 00:12:03  1       43351   0              19
9999993  14   NaN              208      2017-11-07 00:12:03  1       53454   0              19
9999994  3    NaN              371      2017-11-07 00:12:03  1       9094    0              37
9999995  21   NaN              128      2017-11-07 00:12:03  1       64609   0              11
9999996  12   NaN              178      2017-11-07 00:12:03  1       46277   0              15
9999997  20   NaN              259      2017-11-07 00:12:03  1       149939  0              19
9999998  3    NaN              280      2017-11-07 00:12:03  1       23256   0              27
9999999  15   NaN              278      2017-11-07 00:12:03  1       146470  0              10